NSF PAR Search | NSF Public Access Repository

The Value of Out-of-Distribution Data

De Silva, Ashwin; Ramesh, Rahul; Priebe, Carey E; Chaudhari, Pratik; Vogelstein, Joshua T (December 2025, NeurIPS)

Free, publicly-accessible full text available December 1, 2026
Prospective Learning: Learning for a Dynamic Future

De Silva, Ashwin; Ramesh, Rahul; Yang, Rubing; Yu, Siyu; Vogelstein, Joshua T; Chaudhari, Pratik (January 2025, arxiv.org)

Free, publicly-accessible full text available January 30, 2026
The training process of many deep networks explores the same low-dimensional manifold

https://doi.org/10.1073/pnas.2310002121

Mao, Jialin; Griniasty, Itay; Teoh, Han Kheng; Ramesh, Rahul; Yang, Rubing; Transtrum, Mark K; Sethna, James P; Chaudhari, Pratik (March 2024, Proceedings of the National Academy of Sciences)

We develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training. By examining the underlying high-dimensional probabilistic models, we reveal that the training process explores an effectively low-dimensional manifold. Networks with a wide range of architectures and sizes, trained using different optimization methods, regularization techniques, data augmentation techniques, and weight initializations, lie on the same manifold in the prediction space. Studying this manifold in detail, we find that networks with different architectures follow distinguishable trajectories while other factors have minimal influence; that larger networks train along a manifold similar to that of smaller networks, just faster; and that networks initialized in very different parts of the prediction space converge to the solution along a similar manifold.
Full Text Available
How Capable Can a Transformer Become? A Study on Synthetic, Interpretable Tasks

Ramesh, Rahul; Khona, Mikail; Dick, Robert P.; Tanaka, Hidenori; Lubana, Ekdeep Singh (November 2023, Proc. NeurIPS Workshop on Symmetry and Geometry in Neural Representations)

Transformers trained on huge text corpora exhibit a remarkable set of capabilities. Given the inherent compositional nature of language, one can expect the model to learn to compose these capabilities, potentially yielding a combinatorial explosion of what operations it can perform on an input. Motivated by the above, we aim to assess in this paper “how capable can a transformer become?”. In this work, we train Transformer models on a data-generating process that involves compositions of a set of well-defined monolithic capabilities and show that: (1) Transformers generalize to exponentially or even combinatorially many functions not seen in the training data; (2) composing functions by generating intermediate outputs is more effective at generalizing to unseen compositions; (3) the training data has a significant impact on the model’s ability to compose functions; and (4) attention layers in the latter half of the model seem critical to compositionality.
Full Text Available
How Capable Can a Transformer Become? A Study on Synthetic, Interpretable Tasks

Ramesh, Rahul; Khona, Mikail; Dick, Robert P.; Tanaka, Hidenori; Lubana, Ekdeep Singh (November 2023, arXiv preprint)

Transformers trained on huge text corpora exhibit a remarkable set of capabilities, e.g., performing simple logical operations. Given the inherent compositional nature of language, one can expect the model to learn to compose these capabilities, potentially yielding a combinatorial explosion of what operations it can perform on an input. Motivated by the above, we aim to assess in this paper “how capable can a transformer become?”. Specifically, we train autoregressive Transformer models on a data-generating process that involves compositions of a set of well-defined monolithic capabilities. Through a series of extensive and systematic experiments on this data-generating process, we show that: (1) autoregressive Transformers can learn compositional structures from the training data and generalize to exponentially or even combinatorially many functions; (2) composing functions by generating intermediate outputs is more effective at generalizing to unseen compositions than generating no intermediate outputs; (3) the training data has a significant impact on the model’s ability to compose unseen combinations of functions; and (4) the attention layers in the latter half of the model are critical to compositionality.
Full Text Available
The Value of Out-of-Distribution Data

De Silva, Ashwin; Ramesh, Rahul; Priebe, Carey E; Chaudhari, Pratik; Vogelstein, Joshua T (July 2023, arxiv.org)

Full Text Available
A Picture of the Space of Typical Learnable Tasks

Ramesh, Rahul; Mao, Jialin; Griniasty, Itay; Yang, Rubing; Teoh, Han Kheng; Transtrum, Mark K; Sethna, James P; Chaudhari, Pratik (May 2023, Proceedings of the 40th International Conference on Machine Learning)

Full Text Available
A Picture of the Space of Typical Learnable Tasks

Ramesh, Rahul; Mao, Jialin; Griniasty, Itay; Yang, Rubing; Teoh, Han Kheng; Transtrum, Mark K; Sethna, James P; Chaudhari, Pratik (May 2023, Proceedings of the 40th International Conference on Machine Learning)

We develop information-geometric techniques to understand the representations learned by deep networks when they are trained on different tasks using supervised, meta-, semi-supervised and contrastive learning. We shed light on the following phenomena that relate to the structure of the space of tasks: (1) the manifold of probabilistic models trained on different tasks using different representation learning methods is effectively low-dimensional; (2) supervised learning on one task results in a surprising amount of progress even on seemingly dissimilar tasks; progress on other tasks is larger if the training task has diverse classes; (3) the structure of the space of tasks indicated by our analysis is consistent with parts of the WordNet phylogenetic tree; (4) episodic meta-learning algorithms and supervised learning traverse different trajectories during training but they fit similar models eventually; (5) contrastive and semi-supervised learning methods traverse trajectories similar to those of supervised learning. We use classification tasks constructed from the CIFAR-10 and ImageNet datasets to study these phenomena. Code is available at https://github.com/grasp-lyrl/picture_of_space_of_tasks.
Full Text Available
Design and Characterization of the Dual Independent Swirl Combustor Facility (DISCo)

https://doi.org/10.2514/6.2021-3479

Ramesh, Rahul; Obidov, Sanjar; Paredes, Juan; Bernstein, Dennis S.; Gamba, Mirko (August 2021, AIAA Propulsion and Energy Forum)
Description of the DISCo facility and adaptive control experiments
Full Text Available
Prospective Learning: Principled Extrapolation to the Future

De Silva, Ashwin; Ramesh, Rahul; Ungar, Lyle; Shuler, Marshall Hussain; Cowan, Noah J; Platt, Michael; Li, Chen; Isik, Leyla; Roh, Seung-Eon; Charles, Adam; et al. (August 2023, Conference on Lifelong Learning Agents (CoLLAs))

Full Text Available